
Author(s): 

ALSUMAIT L. | DOMENICONI C.

Issue Info: 
  • Year: 

    2007
  • Volume: 

    -
  • Issue: 

    -
  • Pages: 

    0-0
Measures: 
  • Citations: 

    1
  • Views: 

    146
  • Downloads: 

    0
Keywords: 
Abstract: 

Yearly Impact: Scientific Information Database (SID) - Trusted Source for Research and Academic Resources
Issue Info: 
  • Year: 

    2023
  • Volume: 

    35
  • Issue: 

    3
  • Pages: 

    981-1010
Measures: 
  • Citations: 

    0
  • Views: 

    53
  • Downloads: 

    10
Abstract: 

The present study aimed to design a method for organizing Persian text documents using clustering. A data set of theses and dissertations comprising 2943 studies was taken as the statistical population. The data were drawn from a collection of scientific research records containing 5,000 studies in Excel format. After the data were converted into a structured format, they were preprocessed. In the processing stage, clustering was used to develop the proposed algorithm for organizing Persian text documents; the algorithm improves on K-means for document clustering. The evaluation showed that, on external criteria, the proposed algorithm improved the clustering quality of documents compared with both K-means and K-means++: the studies of each designated category were uniformly distributed within the related subject cluster, which achieved the purpose of the study. In the category/cluster tables obtained from K-means and K-means++, the studies were distributed non-uniformly across clusters, so the evaluation based on internal criteria was affected by differing cluster densities and inter-cluster similarity. The size of the data set was also not affected by the proposed solutions for selecting the final data set and the research process, so the proposed algorithm works well for high-dimensional features.
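The abstract does not say which external criteria were used. As an illustration only, purity is one common external criterion: it measures how well clusters line up with known categories. The sketch below is not the authors' evaluation code, and the toy cluster ids and category names are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, category_labels):
    """Fraction of documents that fall in the majority category of their cluster."""
    if len(cluster_ids) != len(category_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")
    # Count how often each category occurs inside each cluster.
    per_cluster = {}
    for cluster, category in zip(cluster_ids, category_labels):
        per_cluster.setdefault(cluster, Counter())[category] += 1
    # Sum the size of the dominant category in every cluster.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents, two clusters, two subject categories.
clusters = [0, 0, 0, 1, 1, 1]
categories = ["law", "law", "physics", "physics", "physics", "physics"]
score = purity(clusters, categories)
```

A purity of 1.0 means every cluster holds a single category; lower values indicate categories mixed within clusters.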

Issue Info: 
  • Year: 

    2019
  • Volume: 

    7
  • Issue: 

    3
  • Pages: 

    443-450
Measures: 
  • Citations: 

    0
  • Views: 

    204
  • Downloads: 

    90
Abstract: 

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as Term Frequency-Inverse Document Frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use semantic models for document vector representations. Latent Dirichlet Allocation (LDA) topic modeling and doc2vec neural document embedding are two well-known techniques for this purpose. In this work, we first studied the conceptual difference between the two models and showed that they behave differently and capture semantic features of texts from different perspectives. We then proposed a hybrid approach for document vector representation to benefit from the advantages of both models. Experimental results on the 20 Newsgroups dataset showed the superiority of the proposed model over each of the baselines on both text clustering and classification. We achieved a 2.6% improvement in F-measure for text clustering and a 2.1% improvement in F-measure for text classification compared to the best baseline model.
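For readers unfamiliar with the word-based baseline named in the abstract, the following is a minimal sketch of TF-IDF weighting. It assumes relative term frequency and an unsmoothed logarithmic inverse document frequency, which is one of several common variants. The toy documents are hypothetical, and the LDA and doc2vec models are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return one sparse {term: weight} dict per document."""
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus, already tokenized.
docs = [
    "graphics card driver".split(),
    "hockey game score".split(),
    "graphics game engine".split(),
]
vectors = tfidf_vectors(docs)
```

A term that occurs in every document receives weight zero under this variant, which illustrates why TF-IDF emphasizes distinctive words but carries no notion of meaning.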

Issue Info: 
  • Year: 

    2022
  • Volume: 

    52
  • Issue: 

    3
  • Pages: 

    205-215
Measures: 
  • Citations: 

    0
  • Views: 

    136
  • Downloads: 

    23
Abstract: 

Distance-based clustering methods categorize samples by optimizing a global criterion, finding ellipsoid clusters of roughly equal size. In contrast, density-based clustering techniques form clusters of arbitrary shape and size by optimizing a local criterion. Most of these methods have several hyper-parameters, and their performance depends heavily on the hyper-parameter setup. Recently, a Gaussian Density Distance (GDD) approach was proposed to optimize local criteria in terms of the distance and density properties of samples. GDD can find clusters of different shapes and sizes without any free parameters. However, it may fail to discover the appropriate clusters because already clustered samples interfere with the estimation of the density and distance properties of the remaining unclustered samples. Here, we introduce Adaptive GDD (AGDD), which eliminates this unwanted effect of clustered samples by adaptively updating the parameters during clustering. It is stable and can identify clusters of various shapes, sizes, and densities without adding extra parameters. Because the distance metric used to compute the dissimilarity between samples can affect clustering performance, the effect of different distance measures on the method is also analyzed. Experiments on several well-known datasets show the effectiveness of the proposed AGDD method compared to other well-known clustering methods.
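The abstract notes that the choice of distance metric can change clustering results but does not list the metrics studied. The sketch below assumes three standard ones (Euclidean, Manhattan, cosine) and uses hypothetical toy points to show that the nearest neighbor of a sample can differ between metrics. It does not implement GDD or AGDD.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, candidates, distance):
    """Index of the candidate closest to query under the given distance."""
    return min(range(len(candidates)), key=lambda i: distance(query, candidates[i]))


# Hypothetical toy points: one far away but in the same direction as the
# query, one close by but pointing in a different direction.
query = (1.0, 1.0)
candidates = [(10.0, 10.0), (2.0, 0.0)]
by_euclidean = nearest(query, candidates, euclidean)
by_manhattan = nearest(query, candidates, manhattan)
by_cosine = nearest(query, candidates, cosine_distance)
```

Euclidean and Manhattan distance pick the nearby point, while cosine distance picks the far point that shares the query's direction, so any neighbor-based density estimate would differ accordingly.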

Issue Info: 
  • Year: 

    2025
  • Volume: 

    23
  • Issue: 

    80
  • Pages: 

    307-324
Measures: 
  • Citations: 

    0
  • Views: 

    17
  • Downloads: 

    0
Abstract: 

Text clustering is a method for separating specific information from textual data and can even classify text according to topic and sentiment, which has drawn much interest in recent years. Deep clustering methods are especially important among clustering techniques because of their high accuracy. These methods include two main components: dimensionality reduction and clustering. Many earlier efforts have employed autoencoders for dimensionality reduction; however, autoencoders cannot reduce dimensions according to manifold structure, so samples that are similar to one another are not necessarily placed close together in the low-dimensional space. In this paper, we develop a Deep Text clustering method based on a local Manifold in the Autoencoder layer (DCTMA) that employs multiple similarity matrices to obtain manifold information; the final similarity matrix is the average of these matrices. The resulting matrix is added to the bottleneck representation layer of the autoencoder. The main goal of DCTMA is to generate similar representations for samples belonging to the same cluster; after accurate dimensionality reduction, clusters are detected using end-to-end deep clustering. Experimental results demonstrate that the proposed method performs well in comparison to current state-of-the-art methods on text datasets.
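The averaging of several similarity matrices described in the abstract can be sketched as follows. The abstract does not name the similarity measures, so cosine similarity and a Gaussian kernel are assumptions chosen for illustration. The sample points and the `sigma` parameter are hypothetical, and the autoencoder itself is not shown.

```python
import math


def cosine_similarity_matrix(points):
    # Assumes no point is the zero vector.
    norms = [math.sqrt(sum(x * x for x in p)) for p in points]
    return [
        [sum(a * b for a, b in zip(p, q)) / (norms[i] * norms[j])
         for j, q in enumerate(points)]
        for i, p in enumerate(points)
    ]


def gaussian_similarity_matrix(points, sigma=1.0):
    # exp(-||p - q||^2 / (2 * sigma^2)); sigma is a hypothetical bandwidth.
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
         for q in points]
        for p in points
    ]


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy samples: the first two are close, the third is distant.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
combined = average_matrices([
    cosine_similarity_matrix(points),
    gaussian_similarity_matrix(points),
])
```

Averaging keeps the combined matrix symmetric and on the same scale as its inputs, so pairs that every measure rates as similar stay highly rated.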

Issue Info: 
  • Year: 

    2019
  • Volume: 

    11
  • Issue: 

    4
  • Pages: 

    58-65
Measures: 
  • Citations: 

    0
  • Views: 

    211
  • Downloads: 

    73
Abstract: 

Text classification has a wide range of applications, such as spam filtering, automated indexing of scientific articles, identifying the genre of documents, and news monitoring. Text datasets usually contain much irrelevant and noisy information, which reduces the efficiency and raises the cost of classification. Therefore, feature selection methods are widely used to handle the high dimensionality of the data and make text classification effective. In this paper, a novel feature selection method based on the combination of information gain and the FAST algorithm is proposed. In the proposed method, the information gain of each feature is calculated first, and the features with higher information gain are selected. The FAST algorithm, which uses graph-theoretic clustering methods, is then applied to the selected features. To evaluate the performance of the proposed method, we carry out experiments on three text datasets and compare our algorithm with several feature selection techniques. The results confirm that the proposed method produces a smaller feature subset in a shorter time. In addition, the evaluation of a K-nearest neighbor classifier on validation data shows that the novel algorithm gives higher classification accuracy.
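The first stage described in the abstract, ranking features by information gain, can be sketched as below. This is a minimal illustration that assumes binary term presence as the feature. The toy documents, labels, and cut-off of two terms are hypothetical, and the subsequent FAST clustering stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(term, docs, labels):
    """Entropy reduction from splitting the documents on presence of term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    remainder = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty splits to avoid dividing by zero
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: each document is a set of terms.
docs = [{"free", "offer"}, {"free", "win"}, {"meeting", "agenda"}, {"meeting", "offer"}]
labels = ["spam", "spam", "ham", "ham"]
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(t, docs, labels), reverse=True)
top_terms = ranked[:2]  # hypothetical cut-off
```

Terms that split the classes cleanly score highest, while a term spread evenly across classes scores zero and would be dropped before any further processing.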

Author(s): 

Journal: 

Natural Hazards

Issue Info: 
  • Year: 

    2022
  • Volume: 

    111
  • Issue: 

    1
  • Pages: 

    0-0
Measures: 
  • Citations: 

    1
  • Views: 

    18
  • Downloads: 

    0
Keywords: 
Abstract: 

Issue Info: 
  • Year: 

    2022
  • Volume: 

    14
  • Issue: 

    4
  • Pages: 

    657-676
Measures: 
  • Citations: 

    0
  • Views: 

    283
  • Downloads: 

    0
Abstract: 

Mobile applications (mobile apps) are expanding rapidly in the modern economy. At the same time, a new paradigm called social commerce, which depends significantly on mobile app platforms, has formed in the field of e-commerce. The rapid development and growth of mobile computing, smartphones, and Web 2.0 technology have facilitated social commerce. This study seeks to find components related to the business model proposed by Osterwalder and Pigneur in mobile social commerce apps by applying text mining (the k-means clustering algorithm) to previous studies. To build the text-mining corpus, abstracts and keywords of 2913 articles on mobile apps, social commerce, and business models were collected from the Scopus repository; after the pre-processing steps, these articles were grouped into seven clusters. The articles in each cluster were then analyzed to identify their subject, and these topics were mapped to six components of the Osterwalder business model: key activities, key resources, key partnerships, value propositions, channels, and customer relationships. Finally, suggestions are made for research on the three missing components of the business model: customer segmentation, revenue streams, and cost structure.
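The k-means algorithm named in the abstract alternates between assigning points to their nearest centroid and moving each centroid to the mean of its members. The sketch below illustrates that loop on hypothetical two-dimensional points with two clusters and fixed starting centroids. The study itself clustered vectorized abstracts into seven clusters, and its preprocessing and initialization are not described in the abstract.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm; returns (assignments, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
assignments, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```

In a text-mining setting each point would be a document vector, and the members of each resulting cluster would then be read to decide what topic the cluster represents.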

Issue Info: 
  • Year: 

    2019
  • Volume: 

    11
  • Issue: 

    1
  • Pages: 

    27-35
Measures: 
  • Citations: 

    0
  • Views: 

    189
  • Downloads: 

    115
Abstract: 

Semantic relations between words, such as synsets, are used in automatic ontology production, which is a strong tool in many NLP tasks. Synset extraction usually depends on other languages and resources, using techniques such as mapping or translation. In our proposed method, synsets are extracted purely from text and corpora. This removes the need for special resources such as WordNets or dictionaries. The words of the corpus are represented using the Vector Space model, and the words most similar to each word are extracted based on common features count (CFC) using a modified cosine similarity measure. Furthermore, a graph-based soft clustering approach is applied to create clusters of synonymous words. To examine the performance of the proposed method, the extracted synsets were compared to other Persian semantic resources. Results show an accuracy of 80.25%, an improvement over the 69.5% accuracy of the pure clustering by committee method.
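The abstract does not define its modified cosine measure, so the sketch below only illustrates the two ingredients it names: cosine similarity over sparse context vectors and a count of shared features. Ranking by shared-feature count first and cosine second is an assumption made for this example, not the paper's formula, and the English toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse {feature: weight} vectors."""
    shared = set(u) & set(v)
    dot = sum(u[f] * v[f] for f in shared)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def common_features_count(u, v):
    """Number of context features the two words share."""
    return len(set(u) & set(v))


def most_similar(word, vectors, top_n=2):
    target = vectors[word]
    scored = [
        (other, common_features_count(target, vec), cosine_similarity(target, vec))
        for other, vec in vectors.items() if other != word
    ]
    # Hypothetical ranking: shared-feature count first, then cosine similarity.
    scored.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return [other for other, _, _ in scored[:top_n]]


# Hypothetical toy context vectors (word -> co-occurring features).
vectors = {
    "car": {"drive": 2.0, "road": 1.0, "engine": 1.0},
    "automobile": {"drive": 1.0, "road": 1.0, "engine": 2.0},
    "bicycle": {"road": 1.0, "pedal": 2.0},
    "apple": {"eat": 2.0, "tree": 1.0},
}
neighbors = most_similar("car", vectors, top_n=1)
```

Candidate lists like `neighbors` would then feed a clustering step that groups mutually similar words into synsets.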

Issue Info: 
  • Year: 

    2019
  • Volume: 

    16
  • Issue: 

    2
  • Pages: 

    14-29
Measures: 
  • Citations: 

    0
  • Views: 

    162
  • Downloads: 

    134
Abstract: 

The emergence of the digital information era and the rapid development of the Internet are gradually moving information from paper to electronic form. This enables users to search news and books electronically, so information retrieval systems have become essential. This paper proposes a system for text classification by means of semi-supervised fuzzy clustering with a weighted feature vector. In the proposed method, after a preprocessing phase, a Genetic Algorithm together with the TF-IDF method is used for dimensionality reduction. Accordingly, the features with the highest discriminating power are chosen, and finally the documents are classified with the clustering algorithm C-W-FCM. The proposed clustering algorithm applies the Euclidean distance with different weights for different dimensions. The proposed approach is evaluated on the Reuters dataset using a number of prominent clustering criteria, including Fukuyama and Sugeno (FS). It is assumed that a small number of documents, called the seeded set, have labels. Simulation results show that the proposed approach is 27 to 33% superior to conventional clustering algorithms in determining clusters, based on the evaluation criteria. In addition, the proposed clustering algorithm increases the effectiveness of the system, especially when documents are highly similar to each other.
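The per-dimension weighted Euclidean distance mentioned in the abstract can be combined with the standard fuzzy c-means membership update as sketched below. This is not the C-W-FCM algorithm itself: how the weights are learned, how the seeded set is used, and the Genetic Algorithm step are not described in the abstract. The centers, weights, and fuzzifier `m` here are hypothetical.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance with a separate weight for each dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def fuzzy_memberships(point, centers, weights, m=2.0):
    """Standard fuzzy c-means membership of one point in every cluster."""
    distances = [weighted_euclidean(point, c, weights) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical toy setup: two centers, second dimension down-weighted.
centers = [(0.0, 0.0), (4.0, 0.0)]
weights = (1.0, 0.25)
memberships = fuzzy_memberships((1.0, 2.0), centers, weights)
```

The memberships of a point always sum to one, and lowering the weight of a dimension reduces its influence on which cluster the point leans toward.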
